Biological interaction: Genetic factor(s) and environmental factor(s) participate in the same causal mechanism (Rothman et al., 2008)
Statistical interaction using linear regression (unrelated individuals):
\(y = \mu + \beta_g x_g + \beta_e x_e + \beta_{int} x_g \times x_e + e\)
(Aschard et al., 2012, HumGen)
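The interaction model above can be sketched in a few lines for unrelated individuals. This is a minimal illustration using ordinary least squares on simulated data; the function name `fit_interaction`, the effect sizes, and the allele and exposure frequencies are all assumptions chosen for the example, not values from the cited studies.

```python
import numpy as np

def fit_interaction(y, x_g, x_e):
    """OLS fit of y = mu + b_g*x_g + b_e*x_e + b_int*x_g*x_e + e."""
    X = np.column_stack([np.ones_like(x_g, dtype=float), x_g, x_e, x_g * x_e])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

# Simulated example (all parameters are illustrative assumptions)
rng = np.random.default_rng(0)
n = 5000
x_g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
x_e = rng.binomial(1, 0.5, size=n).astype(float)  # binary exposure
y = 1.0 + 0.2 * x_g + 0.5 * x_e + 0.3 * x_g * x_e + rng.normal(size=n)

beta, se = fit_interaction(y, x_g, x_e)
z_int = beta[3] / se[3]  # Wald statistic for the interaction term
```

The last coefficient estimates \(\beta_{int}\); its Wald statistic is what the interaction test evaluates.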
| Consortium | Sample size | Exposure | Outcome | Reference |
|---|---|---|---|---|
| CHARGE + SpiroMeta | 50,047 | Smoking | Pulmonary function | (Hancock et al., 2012) |
| SUNLIGHT | 35,000 | Vitamin D intake | Circulating Vitamin D level | (Wang et al., 2010) |
| GIANT | up to 339,224 | Gender | Anthropometric traits | (Heid et al., 2010) |
| … |
Example: G x smoking in pulmonary function outcomes (Hancock et al., 2012)
Findings: three novel gene regions
Figure: G x ever-smoking in FEV1/FVC (Hancock et al., 2012)
Abbreviations: FEV1, Forced Expiratory Volume in 1 second; FVC, Forced Vital Capacity
Example: G x gender in the Genetic Investigation of Anthropometric Traits (GIANT) consortium
Findings:
Abbreviations: WHR, Waist-hip ratio
Stratified framework described in (Magi et al., 2010), (Randall et al., 2013)
Statistical power for interaction tests is much lower than for similar tests of marginal genetic effects (Murcray et al., 2011)
Interaction testing also faces other potential issues (Aschard et al., 2012):
Relatedness adds yet another layer of complexity to GxE analysis,
and its impact on the full and stratified GxE frameworks is seldom explored.
Assess the relative performance of GxE methods
in the presence of structure
Methods to account for relatedness are relatively well established
in marginal association studies (GWAS)
\(y = X \beta + g + f + e\)
\(\mbox{where } g \perp f \perp e\)
\(\mbox{implying}\)
\(y \sim (X \beta, \sigma_g^2 K + \sigma_f^2 F + \sigma_r^2 I) = (X \beta, V)\)
\(X \beta = \mu + \beta_g x_g\)
\(X \beta = \mu + \beta_g x_g + \beta_e x_e + \beta_{int} x_{ge}\)
(Lynch and Walsh, 1998)
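To make the covariance structure concrete, the sketch below builds \(K\), \(F\) and \(V\) for a small sample of nuclear families. It assumes that \(K\) holds expected relationship coefficients (parents unrelated, parent-child and full-sibling pairs at 0.5) and that \(F\) is an all-ones block per household; neither matrix is specified in the source, and the function names and variance components are hypothetical.

```python
import numpy as np

def nuclear_family_matrices(n_families, n_children=2):
    """Return block-diagonal K (relationship) and F (shared household) matrices."""
    size = 2 + n_children
    # Assumed expected relationships: parents 0, parent-child 0.5, siblings 0.5
    k_fam = np.full((size, size), 0.5)
    k_fam[0, 1] = k_fam[1, 0] = 0.0
    np.fill_diagonal(k_fam, 1.0)
    f_fam = np.ones((size, size))  # assumption: one household per family
    eye = np.eye(n_families)
    return np.kron(eye, k_fam), np.kron(eye, f_fam)

def covariance(K, F, s2_g, s2_f, s2_r):
    """V = s2_g * K + s2_f * F + s2_r * I."""
    return s2_g * K + s2_f * F + s2_r * np.eye(K.shape[0])

K, F = nuclear_family_matrices(n_families=3)
V = covariance(K, F, s2_g=0.4, s2_f=0.2, s2_r=0.4)  # illustrative values
```

With these illustrative components the diagonal of `V` is 1, and individuals from different families have zero covariance.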
\(\hat{V} = \hat{\sigma_g^2} K + \hat{\sigma_f^2} F + \hat{\sigma_r^2} I\)
\(\hat{\beta} = (X^T \hat{V}^{-1} X)^{-1} X^T \hat{V}^{-1} y\)
\(var(\hat{\beta}) = (X^T \hat{V}^{-1} X)^{-1}\)
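The two estimator formulas above translate directly into code once \(\hat{V}\) is available. The following is a minimal sketch that takes \(\hat{V}\) as given (estimating the variance components is a separate step not shown here); the function name `gls` and the simulated data are assumptions for illustration.

```python
import numpy as np

def gls(X, y, V):
    """Generalised least squares: beta_hat and its covariance for a known V."""
    V_inv_X = np.linalg.solve(V, X)          # V^{-1} X without forming V^{-1}
    cov_beta = np.linalg.inv(X.T @ V_inv_X)  # (X^T V^{-1} X)^{-1}
    beta_hat = cov_beta @ (V_inv_X.T @ y)    # uses the symmetry of V
    return beta_hat, cov_beta

# Sanity check on simulated data: with V = I, GLS reduces to OLS
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.binomial(2, 0.3, size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
beta_hat, cov_beta = gls(X, y, np.eye(n))
```

Solving the linear system rather than inverting \(\hat{V}\) explicitly is a common numerical choice; for large samples a Cholesky or eigendecomposition of \(\hat{V}\) would typically be reused across SNPs.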
Simplify to a one-covariate model by orthogonalization
\(y^*\), centered \(y\)
\(x^*_g\), centered \(x_g\)
\(var(\hat{\beta}_g) = ({x^*_g}^T \hat{V}^{-1} x^*_g)^{-1}\)
\(y^*\), centered \(y\)
\(x^*_{ge}\), centered \((x_g - \mu_g) (x_e - \mu_e)\)
\(var(\hat{\beta}_{int}) = ({x^*_{ge}}^T \hat{V}^{-1} x^*_{ge})^{-1}\)
The power as a function of the non-centrality parameter (NCP)
\(NCP \approx \beta^2 tr(\hat{V}^{-1} \Sigma_x)\)
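One way to arrive at this approximation, sketched here under the assumptions that \(\hat{V}\) is treated as fixed and that the centered predictor \(x^*\) has mean zero and covariance \(\Sigma_x\):

\(NCP = \beta^2 / var(\hat{\beta}) = \beta^2 \, {x^*}^T \hat{V}^{-1} x^*\)
\(E[{x^*}^T \hat{V}^{-1} x^*] = tr(\hat{V}^{-1} E[x^* {x^*}^T]) = tr(\hat{V}^{-1} \Sigma_x)\)

Replacing the quadratic form by its expectation gives \(NCP \approx \beta^2 tr(\hat{V}^{-1} \Sigma_x)\).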
| Data | Distribution |
|---|---|
| outcome | \(y \sim (X \beta, V) = (X \beta, \sigma_f^2 F + \sigma_g^2 K + \sigma_r^2 I)\) |
| predictor | \(x \sim (\mu_x, \Sigma_x)\) |
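Given an NCP, power for a 1-df test follows from the normal distribution. The sketch below uses only the Python standard library; the function names are hypothetical, and the unrelated-sample NCP assumes the outcome variance is normalised to 1.

```python
from statistics import NormalDist

def power_from_ncp(ncp, alpha=0.05):
    """Power of a two-sided 1-df Wald test with non-centrality parameter ncp."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5  # mean of the Z statistic under the alternative
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

def ncp_unrelated(beta, maf, n):
    """beta^2 * 2pq * N, assuming the outcome variance is normalised to 1."""
    return beta ** 2 * 2 * maf * (1 - maf) * n

# Illustrative values only
ncp = ncp_unrelated(beta=0.1, maf=0.5, n=1000)
power = power_from_ncp(ncp, alpha=0.05)
```

At `ncp = 0` the function returns the type I error rate `alpha`, which is a quick check that the two tails are handled correctly.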
In related individuals
Data simulation of the whole sample (nuclear families):
| structure | \(\Sigma_y\) = \(V\) | \(\Sigma_x\) | \(NCP \approx \beta^2 tr(\hat{V}^{-1} \Sigma_x)\) |
|---|---|---|---|
| unrelated | \((\sigma_g^2 + \sigma_r^2) I\) | \(\sigma_x I\) | \(\beta^2 \, 2pq \, N\) |
| genetically related | \(\sigma_g^2 K + \sigma_r^2 I\) | \(\sigma_x K\) | \(\beta^2 \, 2pq \, tr((\hat{\sigma}_g^2 K + \hat{\sigma}_r^2 I)^{-1} K)\) |
Analytical results = Simulation results
(Visscher et al., 2008)
But our formula also lets us explore performance across a wider range of study designs
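As an illustration of how the trace term can be explored across designs, the sketch below evaluates \(tr((\sigma_g^2 K + \sigma_r^2 I)^{-1} K)\) per individual for one nuclear family, with \(\sigma_g^2 + \sigma_r^2 = 1\). The relationship coefficients in \(K\) (parents 0, parent-child and siblings 0.5) and the function name are assumptions; because \(K\) is block-diagonal, the whole-sample trace is the per-family trace times the number of families.

```python
import numpy as np

def trace_per_person(h2, n_children=2):
    """tr((h2*K + (1-h2)*I)^{-1} K) / family size, for one nuclear family."""
    size = 2 + n_children
    # Assumed expected relationships within a nuclear family
    K = np.full((size, size), 0.5)
    K[0, 1] = K[1, 0] = 0.0
    np.fill_diagonal(K, 1.0)
    V = h2 * K + (1.0 - h2) * np.eye(size)
    return np.trace(np.linalg.solve(V, K)) / size

# Effective information per person relative to an unrelated sample (= 1.0)
ratios = {h2: trace_per_person(h2) for h2 in (0.0, 0.25, 0.5, 0.75)}
```

A value of 1 corresponds to the unrelated row \(\beta^2 \, 2pq \, N\); values below 1 indicate less information per genotyped individual in the family design under these assumptions.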
Data simulation of the whole sample (nuclear families):
| structure | \(\Sigma_y\) = \(V\) | \(\Sigma_x\) | \(NCP \approx \beta^2 tr(\hat{V}^{-1} \Sigma_x)\) |
|---|---|---|---|
| unrelated | \((\sigma_f^2 + \sigma_r^2) I\) | \(\sigma_x I\) | \(\beta^2 \, 2pq \, N\) |
| shared environment | \(\sigma_f^2 F + \sigma_r^2 I\) | \(\sigma_x I\) | \(\beta^2 \, 2pq \, tr((\hat{\sigma}_f^2 F + \hat{\sigma}_r^2 I)^{-1})\) |
Analytical results = Simulation results
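A worked special case may help here, under the assumption (not stated in the source) that \(F\) is block-diagonal with an all-ones block for each of \(N_F\) families of size \(n\). Each block of \(V\) then has the eigenvalue \(n \sigma_f^2 + \sigma_r^2\) once and \(\sigma_r^2\) with multiplicity \(n - 1\), so the trace has a closed form:

\(tr((\sigma_f^2 F + \sigma_r^2 I)^{-1}) = N_F \left( \frac{1}{n \sigma_f^2 + \sigma_r^2} + \frac{n - 1}{\sigma_r^2} \right)\)

With \(n = 1\) and \(\sigma_f^2 + \sigma_r^2 = 1\), this reduces to the number of individuals \(N\), which agrees with the unrelated case.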
Marginal analysis
GxE interaction analysis
Ongoing work
| Strata | Stratified interaction test | Reference |
|---|---|---|
| Independent | \(Z_{int} = \frac{\beta_m - \beta_f}{\sqrt{\sigma_{\beta_m}^2 + \sigma_{\beta_f}^2}} \sim \mathcal{N}(0, 1)\) | (Magi et al., 2010) |
| Related | \(Z_{int} = \frac{\beta_m - \beta_f}{\sqrt{\sigma_{\beta_m}^2 + \sigma_{\beta_f}^2 - 2 r \sigma_{\beta_m} \sigma_{\beta_f}}} \sim \mathcal{N}(0, 1)\) | (Randall et al., 2013) |
\(r\) is the Spearman correlation between the two stratum-specific tests
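The stratified test can be sketched as a small function. It uses the standard variance of a difference of two correlated estimates, \(var(a - b) = var(a) + var(b) - 2 cov(a, b)\), and reduces to the independent-strata test when \(r = 0\). The function name and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def stratified_interaction_z(beta_m, se_m, beta_f, se_f, r=0.0):
    """Z test for a difference in stratum-specific effects.

    r is the correlation between the two stratum-specific estimates;
    r = 0 corresponds to independent strata.
    """
    var_diff = se_m ** 2 + se_f ** 2 - 2.0 * r * se_m * se_f
    z = (beta_m - beta_f) / sqrt(var_diff)
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided p-value
    return z, p

# Illustrative effect sizes and standard errors
z_indep, p_indep = stratified_interaction_z(0.3, 0.1, 0.1, 0.1)
z_rel, p_rel = stratified_interaction_z(0.3, 0.1, 0.1, 0.1, r=0.167)
```

With a positive correlation between strata, ignoring \(r\) overstates the variance of the difference and makes the test conservative.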
Data simulation of the whole sample (nuclear families, shared environment):
Output: \(\rho = 0.167\) between strata
Bear in mind the results from the LD score regression for two outcomes (Bulik-Sullivan et al., 2015)
\(E[Z_{1j} Z_{2j}] = \frac{\sqrt{N_1 N_2} {\rho}_g}{M} l_{j} + \frac{N_s \rho}{\sqrt{N_1 N_2}}\)
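To see how the intercept term behaves, the expectation above can be evaluated directly. This is only an arithmetic sketch of the formula; the function and argument names are hypothetical, and the inputs are illustrative rather than taken from any study.

```python
from math import sqrt

def expected_z_product(n1, n2, rho_g, m, ld_score, n_shared, rho):
    """Expected product of Z scores for SNP j under cross-trait LD score regression."""
    slope = sqrt(n1 * n2) * rho_g / m
    intercept = n_shared * rho / sqrt(n1 * n2)  # driven by sample overlap
    return slope * ld_score + intercept

# Full sample overlap and no genetic covariance: only the intercept remains
value = expected_z_product(n1=1000, n2=1000, rho_g=0.0, m=1_000_000,
                           ld_score=50.0, n_shared=1000, rho=0.167)
```

With complete overlap and \(\rho_g = 0\), the expected product equals the phenotypic correlation \(\rho\), which is one way to read a non-zero correlation between test statistics.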
Previous studies reported
The project aims to leverage ancestry information in GxE tests
Planning
| | Study design 1 | Study design 2 | Study design 3 |
|---|---|---|---|
| Sample | Family-based | Population-based | Population-based |
| Relationships | Kinship | | GRM |
| Method | Linear mixed models | Linear models | Linear mixed models |
GxE in study design 3 is our ongoing work (not presented today)
GxE in study designs 1 vs. 2 (today's focus)
Given: a population of 50,000 related samples (nuclear families)
Experiment: select 5,000 unrelated samples, or select samples at random
| relatedness | \(V\) | \(\Sigma_x\) | Normalization |
|---|---|---|---|
| unrelated | \(\sigma_g^2 K + \sigma_r^2 I = (\sigma_g^2 + \sigma_r^2) I\) | \(\sigma_x I\) | \(\sigma_g^2 + \sigma_r^2 = 1\) |
| genetically related | \(\sigma_g^2 K + \sigma_r^2 I\) | \(\sigma_x K\) | \(\sigma_g^2 + \sigma_r^2 = 1\) |
The Genetic Analysis of Idiopathic Thrombophilia 2 (GAIT2) Project
Developed tools for analysis of family-based samples